Search for: All records

Creators/Authors contains: "Bao, Tianyou"

  1. Post-quantum cryptography (PQC) has drawn significant attention from the hardware design research community, especially on field-programmable gate array (FPGA) platforms. In line with this trend, this paper presents a novel FPGA-based PQC design (CHIRP), i.e., Compact and high-Performance FPGA implementation of unIfied accelerators for Ring-Binary-Learning-with-Errors (RBLWE)-based PQC, a promising lightweight PQC suited for applications such as the Internet-of-Things. The proposed accelerators offer flexibility across the two available security levels, thus expanding their application potential. In total, we present four distinct hardware accelerators tailored to different performance and resource requirements, ranging from resource-constrained devices to high-throughput applications. Our innovation encompasses three key efforts: (i) we derive four optimized algorithms for RBLWE-ENC’s unified operation (covering the two available security levels), allowing flexible switching of security sizes while boosting calculations; (ii) we present the four novel accelerators (CHIRP) targeting FPGA platforms, featuring dedicated hardware structures; (iii) we conduct a comprehensive evaluation to validate the efficiency of the proposed accelerators on various FPGA devices. Compared to the existing unified design, the proposed accelerator demonstrates up to a 91.4% reduction in area-delay product (ADP) on the Stratix-V device. Even when compared with the state-of-the-art single-security-level designs, the proposed accelerator (best version) obtains much better resource usage and ADP performance on both AMD-Xilinx and Intel devices when unified operation (flexible switching between the two security levels) is considered. We anticipate that the findings of this research will foster advancements in FPGA implementation techniques for lightweight PQC development.
    Free, publicly-accessible full text available January 28, 2026
  2. Polynomial multiplication is a key component in many post-quantum cryptography and homomorphic encryption schemes. One recurring variation, ternary polynomial multiplication over the ring $\mathbb{Z}_q/(x^n+1)$, where one input polynomial has ternary coefficients {−1, 0, 1} and the other has large integer coefficients in the range [0, q−1], has recently drawn significant attention from various communities. Following this trend, this paper presents a novel SCALable and area-Efficient Systolic (SCALES) accelerator for ternary polynomial multiplication. In total, we have carried out three layers of coherent, interdependent efforts. First, we have rigorously derived a novel block-processing strategy and algorithm based on the schoolbook method for polynomial multiplication. Then, we have implemented the proposed algorithm as the SCALES accelerator with the help of a number of field-programmable gate array (FPGA)-oriented optimization techniques. Lastly, we have conducted a thorough implementation analysis to showcase the efficiency of the proposed accelerator. The comparison demonstrates that the SCALES accelerator has at least 19.0% and 23.8% less equivalent area-time product (eATP) than the state-of-the-art designs. We hope this work can stimulate continued research in the field.
  3. The National Institute of Standards and Technology (NIST) post-quantum cryptography (PQC) third-round standardization process has recently announced the candidates to be standardized, and FALCON is one of them. However, very few hardware implementations of FALCON have been released, owing to its complicated computation procedure and high complexity. Against this background, in this paper we propose an efficient hardware structure to implement residue numeral system (RNS) decomposition within NTRUSolve (a key arithmetic component for key generation of FALCON). In total, the proposed work comprises three stages of coherent, interdependent efforts. First, we have identified the necessary algorithmic operations related to RNS decomposition. Then, we have designed a hardware structure to realize these algorithms. Finally, a field-programmable gate array (FPGA)-based implementation has been carried out to verify the performance of the proposed hardware structure. For instance, the proposed hardware design achieves an operational time at least 3.91x faster than the software implementation. To the authors’ best knowledge, this is the first paper about the hardware acceleration of RNS decomposition for FALCON, and we hope the outcome of this work will facilitate research in this area.
  4. Post-quantum cryptography (PQC) has gained substantial attention from various communities recently. Along with the ongoing National Institute of Standards and Technology (NIST) PQC standardization process that targets general-purpose PQC algorithms, the research community is also looking for efficient lightweight PQC schemes. Among these efforts, the Ring-Binary-Learning-with-Errors (RBLWE)-based encryption scheme (RBLWE-ENC) is regarded as a promising lightweight PQC fitting Internet-of-Things (IoT) and edge computing applications. As hardware implementation of PQC algorithms has become one of the major advances in the field, in this paper we follow this trend to present an efficient implementation of a lightweight RBLWE-ENC accelerator on the field-programmable gate array (FPGA) platform. Overall, we have demonstrated three coherent, interdependent stages of efforts: (i) we have presented detailed derivation processes to formulate the proposed algorithmic operation; (ii) we have then implemented the proposed algorithm as a hardware accelerator; and (iii) we have provided a thorough complexity analysis and comparison to showcase the superior performance of the proposed accelerator over the state-of-the-art designs, e.g., the proposed accelerator with v=8 has at least 66.67% less area-time complexity than the existing ones (Virtex-7 FPGA). We hope the outcome of this work can facilitate lightweight PQC development.
  5. Recent research in the post-quantum cryptography (PQC) field has gradually shifted to efficient implementation of PQC algorithms on hardware platforms. As polynomial multiplication is typically one of the critical operations within lattice-based PQC, its hardware acceleration has drawn significant attention from the research community recently. We propose a high-speed processing strategy to construct a new High-performance Polynomial Multiplication Accelerator (HPMA) for the key encapsulation mechanism (KEM) Saber. Firstly, we have given a detailed mathematical derivation to obtain a low-latency processing algorithm for Saber polynomial multiplication. Then, we have used the derived algorithm to construct a new structure, HPMA, for FPGA implementation. Lastly, we have demonstrated the superior performance of the proposed HPMA-Saber by comparing it with state-of-the-art works. The proposed design strategy is highly efficient, and the obtained results can be useful for the PQC research community.
  6. Ring-Binary-Learning-with-Errors (RBLWE)-based post-quantum cryptography (PQC) is a promising scheme suitable for lightweight applications. This paper presents an efficient hardware systolic accelerator for RBLWE-based PQC, targeting high-performance applications. We first briefly give the algorithmic background for the proposed design. Then, we map the proposed algorithmic operation onto a new systolic accelerator. Lastly, field-programmable gate array (FPGA) implementation results confirm the efficiency of the proposed accelerator.
  7. Following the rapid progress in the post-quantum cryptography (PQC) field, where many efforts have gradually shifted to the hardware implementation side, this paper presents a novel systolic accelerator for polynomial multiplication within two lattice-based PQC algorithms: the key encapsulation mechanism (KEM) Saber and the binary Ring-Learning-with-Errors (BRLWE)-based encryption scheme. Based on the observation that polynomial multiplication over a ring is the key arithmetic operation for the two PQC schemes, we have proposed a novel systolic accelerator for the targeted polynomial multiplications (applicable to both PQC schemes). A mathematical formulation is given to illustrate the proposed algorithmic operation for both schemes. Then, the proposed systolic accelerator is presented. Finally, field-programmable gate array (FPGA) implementation results are provided to confirm the efficiency of the proposed systolic accelerator under the two schemes. The proposed accelerator is highly efficient, and future work may focus on cryptoprocessor design and side-channel attacks.
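
Several of the abstracts above (items 2, 5, and 7) centre on polynomial multiplication over a ring of the form Z_q/(x^n + 1). As a reading aid, the following is a minimal software sketch of the plain schoolbook method for the ternary case described in item 2. It is not the SCALES block-processing algorithm or any of the listed hardware designs. The function name `ternary_negacyclic_mul` and the toy parameters (n = 4, q = 17) are assumptions chosen only for illustration.

```python
def ternary_negacyclic_mul(a, s, q):
    """Schoolbook product of a(x) and s(x) in Z_q/(x^n + 1).

    a: list of n integer coefficients, each in [0, q-1]
    s: list of n ternary coefficients, each in {-1, 0, 1}
    Coefficient lists are ordered from x^0 upward.
    """
    n = len(a)
    if len(s) != n:
        raise ValueError("both polynomials must have n coefficients")
    if any(c not in (-1, 0, 1) for c in s):
        raise ValueError("s must have ternary coefficients")

    result = [0] * n
    for j, sj in enumerate(s):
        if sj == 0:
            continue  # a zero coefficient contributes nothing
        for i, ai in enumerate(a):
            k = i + j
            term = sj * ai  # multiplying by +1 or -1 is just add or subtract
            if k >= n:
                # Reduce modulo x^n + 1: x^n wraps around as -1
                k -= n
                term = -term
            result[k] = (result[k] + term) % q
    return result


# Toy example (illustrative parameters only):
# a(x) = 1 + 2x + 3x^2 + 4x^3, s(x) = 1 - x^2 + x^3, q = 17
print(ternary_negacyclic_mul([1, 2, 3, 4], [1, 0, -1, 1], 17))  # [2, 3, 15, 3]
```

Because one operand is ternary, every partial product reduces to an addition, a subtraction, or a skip, which is one reason this variant is attractive for compact hardware.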